Mining Incomplete Data with Many Missing Attribute Values: A Comparison of Probabilistic and Rough Set Approaches
Abstract
In this paper, we study probabilistic and rough set approaches to missing attribute values. Probabilistic approaches are based on imputation: a missing attribute value is replaced either by the most probable known attribute value or by the most probable attribute value restricted to a concept. In the rough set approach, we consider two interpretations of a missing attribute value: lost and “do not care”. Additionally, we apply three definitions of approximations (singleton, subset and concept) and use an additional parameter called α. Our main objective was to compare probabilistic and rough set approaches to missing attribute values for incomplete data sets with many missing attribute values. We conducted experiments on six incomplete data sets with as many missing attribute values as possible. In these data sets, any additional incremental replacement of known values by missing attribute values resulted in entire records filled only with missing attribute values. Rough set approaches were better for five data sets; for one data set, a probabilistic approach was more successful.
Keywords: data mining; probabilistic approaches to missing attribute values; rough set theory; probabilistic approximations; parameterized approximations
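The two imputation strategies named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `"?"` marker, the function names `most_common_known` and `impute`, and the toy `temp`/`flu` table are all assumptions made for illustration, and "most probable" is read here as "most frequent among known values". A concept is taken to be the set of cases sharing one decision value.

```python
from collections import Counter

MISSING = "?"  # hypothetical marker for a missing attribute value


def most_common_known(values):
    """Return the most frequent known value, or MISSING if none is known."""
    known = [v for v in values if v != MISSING]
    if not known:
        return MISSING
    return Counter(known).most_common(1)[0][0]


def impute(rows, attributes, decision=None):
    """Return copies of rows with missing attribute values replaced.

    decision=None: use the most frequent known value over all cases.
    decision=name: use the most frequent known value among cases that
    share the row's decision value (i.e. restricted to its concept).
    """
    result = []
    for row in rows:
        new_row = dict(row)
        for a in attributes:
            if row[a] != MISSING:
                continue
            if decision is None:
                pool = [r[a] for r in rows]
            else:
                pool = [r[a] for r in rows if r[decision] == row[decision]]
            new_row[a] = most_common_known(pool)
        result.append(new_row)
    return result


# Toy data set (invented for illustration only).
rows = [
    {"temp": "high", "flu": "yes"},
    {"temp": "high", "flu": "yes"},
    {"temp": "?", "flu": "yes"},
    {"temp": "normal", "flu": "no"},
    {"temp": "normal", "flu": "no"},
    {"temp": "normal", "flu": "no"},
    {"temp": "?", "flu": "no"},
]

global_fill = impute(rows, ["temp"])
concept_fill = impute(rows, ["temp"], decision="flu")
```

In this toy table, the third case receives `normal` under the global strategy (three of the five known values) but `high` under the concept-restricted strategy, because both known values in the `flu = yes` concept are `high`. In this sketch, a value stays missing when no known value exists in the pool, which is one possible convention rather than anything stated in the abstract.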
Related resources
A comparison of traditional and rough set approaches to missing attribute values in data mining
Real-life data sets are often incomplete, i.e., some attribute values are missing. In this paper we compare traditional, frequently used methods of handling missing attribute values, which are based on preprocessing, with another class of methods dealing with missing attribute values in which rule induction is performed directly on incomplete data sets, i.e., handling missing attribute values a...
A Comparative Study on Decision Rule Induction for incomplete data using Rough Set and Random Tree Approaches
Handling missing attribute values is one of the most challenging processes in data analysis. Many approaches can be adopted to handle missing attributes. In this paper, a comparative analysis is made of an incomplete dataset for future prediction using rough set approach and random tree generation in data mining. The result of simple classification technique (using random tree ...
A Rough Set Approach for Generation and Validation of Rules for Missing Attribute
Data mining has emerged as a most significant and continuously evolving field of research because of its ever-growing and far-reaching applications in areas such as medicine, the military, financial markets, and banking. One of the most useful applications of data mining is extracting significant and earlier unknown knowledge from real-world databases. This knowledge may be in the form of r...
A Rough Set Model Based on Probabilistic Similarity Measure for Incomplete Decision Tables
Rough set models in incomplete decision tables have been discussed so far. Numerous approaches to deal with missing values in incomplete information systems have been proposed. In this paper, assuming that the domain of attribute values is defined, we apply the probability of values appearing in data tables in order to measure the self-information of similarity. This is defined as the uncertain...
Mining from incomplete quantitative data by fuzzy rough sets
Machine learning can extract desired knowledge from existing training examples and ease the development bottleneck in building expert systems. Most learning approaches derive rules from complete data sets. If some attribute values are unknown in a data set, it is called incomplete. Learning from incomplete data sets is usually more difficult than learning from complete data sets. In the past, t...